Method, system, and computer product for collecting and distributing clinical data for data mining

ABSTRACT

A method for collecting and distributing clinical data for data mining. The method comprises selecting a local report for collection, where the local report is a structured reporting object and includes a local clinical code, patient identification data and report collection status data. The local report is sanitized by removing the patient identification data. The local report collection status is updated to reflect the selecting. The local report is mapped to a common format report and the mapping includes: accessing a knowledge base that includes the local clinical code and a corresponding common format clinical code; and replacing the local clinical code with the corresponding common format code. The common format report is then transmitted to a data mining host system that includes a data repository.

BACKGROUND OF INVENTION

[0001] The present disclosure relates generally to a method for collecting and distributing clinical data for data mining and, in particular, to a method for collecting and distributing coded clinical data that has been created in different hospitals using different clinical data codes and formats.

[0002] Hospitals typically utilize computer systems to manage the various departments within the hospital. Data about each patient is collected by a variety of computer systems. For example, a patient may be admitted to the hospital for a Transthoracic Echo (TTE). Information about the patient (e.g., demographics and insurance) could be obtained by the hospital information system (HIS) and stored on a patient record. This information could then be passed to the cardiology department system (commonly known as the cardiovascular information system, or CVIS). Typically the CVIS is a product of one company, while the HIS is the product of another company. As a result, the databases of the two systems will be different. Further, they will capture, retain and send data at different levels of granularity. Once the patient information has been received by the CVIS, the patient can be scheduled for a TTE in the echo lab. Next, the TTE is performed by the sonographer. Images and measurements are taken and sent to the CVIS server. The reading physician (e.g., an echocardiographer) sits down at a review station and pulls the patient's TTE study. The echocardiographer then begins to review the images and measurements and creates a complete medical report on the study. The medical report can then be coded as a structured report (SR) document including clinical data codes describing the contents of the report. When the echocardiographer completes the medical report, the report is sent to the CVIS server where it is stored and associated with the patient through patient identification data. This completed medical report with clinical data codes is an example of the kind of report that could be sent to a data repository for data mining.

[0003] The ability to create a medical record data repository containing medical data from more than one hospital for use in data mining faces several challenges. First, it is difficult to get consistent clinical data that can be compared across hospitals and within hospitals. For example, Doctor W may abbreviate; Doctor X may standardize his reports to capture only a particular set of information that he is interested in; Doctor Y may inflate his reports to include massive amounts of data that may be irrelevant to the given procedure; and Doctor Z may use software that only reports on some information that is searchable, while the rest of his report is unsearchable text. Another challenge affecting the ability to perform medical data mining has to do with obtaining the clinical data in a consistent fashion. For example, Hospital A may use software that records some information that is searchable, leaving the rest as unsearchable text; Hospital B may utilize dictation that produces difficult-to-search data; and Hospital C may use software that records the same information as the software used at Hospital A, but the labeling of the data collected is different such that a user of the data would have to interpret both labels to know what data has been recorded.

[0004] A further challenge faced in attempting to create a common medical data repository for use in data mining is the difficulty in collecting the clinical data to be entered into the repository. Some data mining software defines specific, in-depth data to be collected but this can require users to re-enter the same data that has already been reported on. This can be too cumbersome and lengthy for the physician, technician or nurse to use. Even if the users are willing to re-enter some of the data, only a subset of the data can be obtained through this process.

SUMMARY OF INVENTION

[0005] One aspect of the invention is a method for collecting and distributing clinical data for data mining. The method comprises selecting a local report for collection, where the local report is a structured reporting (SR) object and includes a local clinical code, patient identification data and report collection status data. The local report is sanitized by removing the patient identification data. The local report collection status is updated to reflect the selecting. The local report is mapped to a common format report and the mapping includes: accessing a knowledge base that includes the local clinical code and a corresponding common format clinical code; and replacing the local clinical code with the corresponding common format code. The common format report is then transmitted to a data mining host system that includes a data repository.

[0006] Another aspect of the invention is a method for collecting and distributing clinical data for data mining. The method comprises receiving a common format report at a data mining host system, where the common format report is an SR object and includes clinical coded medical data. The data mining host system includes a data repository. The contents of the common format report are mapped to fields in the data repository using an XML schema. The contents of the common format report are added to the data repository responsive to the mapping.

[0007] Another aspect of the invention is a system for collecting and distributing clinical data for data mining. The system comprises a network and a hospital computer system in communication with the network. The hospital computer system includes software to implement a method. The method comprises selecting a local report located on the hospital computer system for collection, where the local report is an SR object and includes a local clinical code, patient identification data and report collection status data. The local report is sanitized by removing the patient identification data. The local report collection status is updated to reflect the selecting. The local report is mapped to a common format report and the mapping includes: accessing a knowledge base located on the hospital computer system that includes the local clinical code and a corresponding common format clinical code; and replacing the local clinical code with the corresponding common format code. The common format report is then transmitted over the network to a data mining host system that includes a data repository.

[0008] A further aspect of the invention is a system for collecting and distributing clinical data for data mining. The system comprises a network and a storage device including a data repository. The system also comprises a data mining host system in communication with the network and the storage device. The data mining host system includes software to implement a method comprising receiving a common format report via the network at a data repository located on the storage device, where the common format report is an SR object and includes clinical coded medical data. The contents of the common format report are mapped to fields in the data repository using an XML schema. The contents of the common format report are added to the data repository responsive to the mapping.

[0009] A further aspect of the invention is a computer program product for collecting and distributing clinical data for data mining. The computer program product comprises a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for implementing a method. The method comprises selecting a local report for collection, where the local report is an SR object and includes a local clinical code, patient identification data and report collection status data. The local report is sanitized by removing the patient identification data. The local report collection status is updated to reflect the selecting. The local report is mapped to a common format report and the mapping includes: accessing a knowledge base that includes the local clinical code and a corresponding common format clinical code; and replacing the local clinical code with the corresponding common format code. The common format report is then transmitted to a data mining host system that includes a data repository.

[0010] Further aspects of the invention are disclosed herein. The above-discussed and other features and advantages of the present invention will be appreciated and understood by those skilled in the art from the following detailed description and drawings.

BRIEF DESCRIPTION OF DRAWINGS

[0011] Referring to the exemplary drawings wherein like elements are numbered alike in the several Figures:

[0012] FIG. 1 is an exemplary system for collecting and distributing clinical data for data mining;

[0013] FIG. 2 depicts an exemplary transformation of the clinical reports into a common format for the data repository; and

[0014] FIG. 3 is an exemplary process for collecting and formatting a report for use in public mining.

DETAILED DESCRIPTION

[0015] An embodiment of the present invention provides an infrastructure and process for collecting clinical data. Clinical data that has been stored as a structured reporting (SR) object (e.g., DICOM SR, HL-7 and XML) is collected for use in data mining. An embodiment of the present invention utilizes the emerging DICOM SR standard along with other SR standards, thereby removing the need for data re-entry. The clinical data that is collected for data mining is already stored in a defined and known format within each hospital. This hospital-specific clinical data is then transformed into a common format and stored in a data repository for use in data mining. An embodiment of the present invention also establishes an infrastructure for the collection of these standard SR objects. As part of this infrastructure, the clinical data being sent to the data repository is sanitized so that patient identity can remain anonymous. In addition, the clinical data records, or reports, are collected without duplication and without workflow inhibition. An embodiment of the present invention also establishes an infrastructure for distributing the SR objects, or providing data mining on the SR objects, using a web-based interface for creating user-defined searches. In addition, bundles of sanitized medical reports can be provided to companies to do their own data mining research.

[0016] FIG. 1 is an exemplary system for collecting and distributing clinical data for data mining. Hospital computer systems 110 located at various hospitals are connected to a network 106. The hospital computer systems 110 send medical data to a data repository located on a storage device 108 connected to the data mining host system 104. The hospital computer systems 110 typically include application software to perform coded clinical reporting along with one or more storage devices for storing the coded clinical reporting data as SR objects. In addition, the hospital computer systems include application software to transform the clinical reporting data stored as SR objects from the hospital format into the common format. The SR objects in the common format are then deposited in the data repository on the storage device 108 for use in data mining. FIG. 2, described below, depicts an exemplary transformation of the clinical reports into a common format for the data repository.

[0017] The system of FIG. 1 includes one or more user systems 102 through which an end-user, or customer, can make a request to an application program on the data mining host system 104 to access particular records stored in the data repository located on the storage device 108. In an exemplary embodiment, customers can include research hospital staff members, pharmaceutical company research team members and personnel from companies that make medical products. In an exemplary embodiment, the data mining host system 104 executes programs that provide access to data contained in the data repository located on the storage device 108. The user systems 102 can be directly connected to the data mining host system 104 or they could be coupled to the data mining host system 104 via the network 106. Each user system 102 may be implemented using a general-purpose computer executing a computer program for carrying out the processes described herein. The user systems 102 may be personal computers or host attached terminals. If the user systems 102 are personal computers, the processing described herein may be shared by a user system 102 and the data mining host system 104 by providing an applet to the user system 102.

[0018] The network 106 may be any type of known network including a local area network (LAN), a wide area network (WAN), an intranet, or a global network (e.g., Internet). A user system 102 may be coupled to the data mining host system 104 through multiple networks (e.g., intranet and Internet) so that not all user systems 102 are required to be coupled to the data mining host system 104 through the same network. One or more of the user systems 102 and the data mining host system 104 may be connected to the network 106 in a wireless fashion and the network 106 may be a wireless network. In an exemplary embodiment, the network 106 is the Internet and each user system 102 executes a user interface application to directly connect to the data mining host system 104. In another embodiment, a user system 102 may execute a web browser to contact the data mining host system 104 through the network 106. Alternatively, a user system 102 may be implemented using a device programmed primarily for accessing the network 106 such as WebTV.

[0019] The data mining host system 104 may be implemented using a server operating in response to a computer program stored in a storage medium accessible by the server. The data mining host system 104 may operate as a network server (often referred to as a web server) to communicate with the user systems 102. The data mining host system 104 handles sending and receiving information to and from user systems 102 and hospital computer systems 110 and can perform associated tasks. The data mining host system 104 may also include a firewall to prevent unauthorized access to the data mining host system 104 and enforce any limitations on authorized access. For instance, an administrator may have access to the entire system and have authority to modify portions of the system and a customer may only have access to view a subset of the data repository records for particular products. In an exemplary embodiment, the administrator has the ability to add new users, delete users and edit user privileges. The firewall may be implemented using conventional hardware and/or software as is known in the art.

[0020] The data mining host system 104 also operates as an application server. The data mining host system 104 executes one or more application programs to provide access to the data repository located on the storage device 108, as well as application programs to receive SR objects and to build a data repository for data mining. Processing may be shared by the user system 102 and the data mining host system 104 by providing an application (e.g., Java applet) to the user system 102. Alternatively, the user system 102 can include a stand-alone software application for performing a portion of the processing described herein. Similarly, processing may be shared by the hospital computer system 110 and the data mining host system 104 by providing an application to the hospital computer system 110. Alternatively, the hospital computer system 110 can include a stand-alone software application for performing a portion of the processing described herein. It is understood that separate servers may be used to implement the network server functions and the application server functions. Alternatively, the network server, firewall and the application server can be implemented by a single server executing computer programs to perform the requisite functions.

[0021] The storage device 108 may be implemented using a variety of devices for storing electronic information such as a file transfer protocol (FTP) server. It is understood that the storage device 108 may be implemented using memory contained in the data mining host system 104 or it may be a separate physical device. The storage device 108 contains a variety of information including a data repository containing medical reports from one or more hospitals in a common format (e.g., using the same clinical codes) and a schema describing the common format and database layout. The data mining host system 104 may also operate as a database server and coordinate access to application data including data stored on the storage device 108. The data repository can be physically stored as a single database with access restricted based on user characteristics or it can be physically stored in a variety of databases including portions of the database on the user systems 102 or the data mining host system 104. In an exemplary embodiment, the data repository is implemented using a relational database system and the database system provides different views of the data to different customers based on customer characteristics.
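The customer-specific views mentioned above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database; the table name, column names, view name and customer labels are all hypothetical and do not come from the disclosure.

```python
import sqlite3

# Hypothetical mapping from customer type to the table or view it may query.
VIEW_FOR_CUSTOMER = {
    "administrator": "reports",
    "stent_company": "stent_customer_reports",
}


def build_repository():
    """Create a toy repository with one restricted view (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports ("
        "id INTEGER PRIMARY KEY, product TEXT, procedure_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO reports (product, procedure_code) VALUES (?, ?)",
        [("stent", "600"), ("medication", "600"), ("stent", "610")],
    )
    # The view exposes only stent-related records and hides the product column.
    conn.execute(
        "CREATE VIEW stent_customer_reports AS "
        "SELECT id, procedure_code FROM reports WHERE product = 'stent'"
    )
    return conn


def query_for(conn, customer):
    """Return the rows visible to the given customer type."""
    # The view name comes from the fixed table above, never from user input.
    # Unknown customer types raise KeyError and therefore see nothing.
    view = VIEW_FOR_CUSTOMER[customer]
    return conn.execute(f"SELECT * FROM {view} ORDER BY id").fetchall()
```

In this sketch the administrator sees every record, while the hypothetical stent company sees only the subset exposed by its view.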

[0022] In an exemplary embodiment of the present invention, clinical data coded using different knowledge bases is transformed to a common/canonical representation.

[0023] Hospitals and clinicians use tools for generating clinical reports that depict an outcome of a visit by a patient. These tools use clinical codes to represent clinical terminology and the clinical codes can differ from one tool to another and from one hospital to another. In addition, the type of SR object used to store the clinical data may differ from one tool to another and from one hospital to another. An exemplary embodiment of the present invention can be utilized to map the clinical reports into a common representation (e.g., common clinical codes and type of SR object). This can provide the ability to query or mine data generated by different clinics and hospitals in a consistent manner and discern patterns that are useful to customers such as researchers and clinicians.

[0024] FIG. 2 depicts an exemplary transformation of clinical reports stored as SR objects using hospital specific clinical codes into a common format for the data repository. First, in order to map hospital reports to a common representation, the codes for the clinical terms are categorized and defined. An XML schema can be utilized to define the codes and the format. The clinical codes and categories can either be generated internally or can be based on standards (e.g., SNOMED). Hospital database 202 in FIG. 2 includes data that is stored on a hospital computer system 110 for Hospital A. It includes a local XML schema knowledge base that is utilized to map the local clinical codes and local formats used at Hospital A into a common report format used by the data repository 210. For example, Hospital A may use the code “743.18” to represent a transthoracic echo procedure and the common XML schema knowledge base may use the code “600” to represent a transthoracic echo procedure. Part of the transformation into the common format report would be replacing the local code “743.18” used at Hospital A with the common code “600” utilized by the data repository. Similarly, hospital database 202 includes a local XML schema knowledge base for Hospital B and a clinical report for Hospital B that will be transformed into a common format report with common clinical data codes. Common database 206 located on the data mining host system 104 includes a common XML schema knowledge base for use by the transformation engine 208. In addition, the common database 206 is utilized to create the local XML schema knowledge bases for individual hospitals because it contains a superset of all clinical codes. Along with clinical code translations and SR object translations, the common XML schema knowledge base also includes data repository layout information that determines how the common format reports are represented in the data repository 210.
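The code replacement in the example above can be sketched as follows. This is a simplified illustration only: the knowledge base is modeled as a plain lookup table rather than an XML schema, and the report layout (a `code` element with `scheme` and `value` attributes) is an assumption. Only the "743.18" to "600" pair comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical local knowledge base for Hospital A, modeled as a lookup
# table. Only the "743.18" -> "600" pair is taken from the description.
HOSPITAL_A_KNOWLEDGE_BASE = {"743.18": "600"}

# Assumed shape of a local report; real SR objects are far richer.
LOCAL_REPORT = (
    '<report hospital="A">'
    '<code scheme="local" value="743.18"/>'
    "</report>"
)


def map_to_common_codes(report_xml, knowledge_base):
    """Replace every local clinical code with its common format code."""
    root = ET.fromstring(report_xml)
    for element in root.iter("code"):
        local_code = element.get("value")
        if local_code not in knowledge_base:
            # Refuse to emit a report that still carries an unmapped code.
            raise KeyError(f"no common code for local code {local_code!r}")
        element.set("value", knowledge_base[local_code])
        element.set("scheme", "common")
    return ET.tostring(root, encoding="unicode")
```

Calling `map_to_common_codes(LOCAL_REPORT, HOSPITAL_A_KNOWLEDGE_BASE)` yields a report in which the local code has been replaced by the common code. A second hospital would supply its own lookup table while reusing the same function.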

[0025] As shown in FIG. 2, the local clinical report from Hospital A in hospital database 202 is mapped to the common format report, as defined by the common XML schema knowledge base located on the common database 206, by utilizing a transformation engine 208 such as an Extensible Stylesheet Language Transformations (XSLT) tool. XSLT is a programming language that can be used to specify transformation rules from one XML document to another XML document. These transformations are specific to the knowledge base used in generating a clinical report. XSLT is the language used in XSL style sheets to transform XML documents into other XML documents. In an exemplary embodiment of the present invention, an XSL processor reads the XML document, follows the instructions in the XSL style sheet and then outputs a new XML document or XML document fragment. Therefore, as shown in FIG. 2, there would be one set of mapping rules, or local XML schema knowledge base, for every local knowledge base. These rules can be built by the vendor who is supplying the knowledge base for the report generator or by anyone who understands the underlying knowledge base.

[0026] The replacement of the local clinical codes with the common clinical codes in order to create a common format report is performed by software located at the hospital computer systems 110 using the local schema knowledge base contained in the hospital database 202. The resulting common format report, containing common clinical codes, is then transmitted to the data mining host system 104 to be input to the transformation engine 208 along with the associated local XML schema database located on the hospital database 202 and the common XML schema knowledge base located on the common database 206. The output from the transformation engine 208 is the common format report formatted for the data repository located on the storage device 108. In an alternate exemplary embodiment, all or portions of the transformation engine 208 functions and associated input are located on the hospital computer system 110 and the common format report formatted for the data repository is transmitted to the data mining host system 104 for input to the data repository 210. A single hospital can have more than one local XML schema knowledge base if it utilizes different systems with different formats for clinical reporting. The mapping rules do not need to be expressed as XML schema knowledge bases but could be described in any manner known in the art.
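The host-side step, in which the contents of a common format report are mapped to fields in the data repository, might look like the sketch below. It is not an XML schema or XSLT implementation: a hypothetical field map stands in for the layout information held in the common knowledge base, and the element names, column names and SQLite storage are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical layout information: repository column -> element path in the
# common format report. In the description this role is played by the common
# XML schema knowledge base.
FIELD_MAP = {
    "procedure_code": "procedure/code",
    "patient_alias": "patient/alias",
    "finding": "finding",
}


def create_repository():
    """Create an in-memory repository with one column per mapped field."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports "
        "(procedure_code TEXT, patient_alias TEXT, finding TEXT)"
    )
    return conn


def add_report(conn, report_xml):
    """Map report contents to repository fields, then insert one row."""
    root = ET.fromstring(report_xml)
    # Elements missing from the report are stored as NULL.
    row = {column: root.findtext(path) for column, path in FIELD_MAP.items()}
    columns = ", ".join(row)
    placeholders = ", ".join(":" + column for column in row)
    conn.execute(
        f"INSERT INTO reports ({columns}) VALUES ({placeholders})", row
    )
    conn.commit()
    return row
```

Keeping the layout in a data structure rather than in code mirrors the idea that the repository layout is described by the knowledge base, so a layout change would not require new insertion logic.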

[0027] FIG. 3 is an exemplary process for collecting and formatting a report for use in public mining. The process would typically occur at the hospitals on the hospital computer systems 110 with the output being sent to the data mining host system 104. At step 302, a report stored on the hospital computer system 110 is selected to be sent to the data repository. The report includes a report collected field that will contain the value “no” if the report has not previously been sent and the value “yes” if it has been sent. This avoids the problem of duplicate reports in the data repository because only reports that contain the value “no” in the report collected field will be sent to the data repository. At step 304, the report is sanitized to remove any data items that could be used to track the medical report back to a particular patient. Removing the data items can include one or both of: replacing the removed data item with an alias for use in grouping reports relating to the same patient; and removing the data item and leaving the data item blank. At step 306, the report collected field is set to “yes” to avoid sending this report to the data repository more than once. Then, at step 308, the report is mapped to the common format, as described in reference to FIG. 2. At step 310, the report that has been mapped into a common format is sent to the data repository for use in data mining. This process of collecting can be performed by off-line administrative transport via regular web based submissions or physical postal service to avoid interfering with workflow.
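Steps 302 through 310 can be sketched as a short pipeline. The field names, the `LocalReport` class, the `send` callback and the use of a salted hash as the patient alias are all assumptions made for illustration; the description does not specify how an alias is generated or how reports are represented.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical names of the fields that identify a patient.
IDENTIFYING_FIELDS = ("patient_name", "patient_id")


@dataclass
class LocalReport:
    fields: dict
    collected: str = "no"  # the "report collected" field


def sanitize(fields, salt):
    """Drop identifying fields and add an alias for grouping reports."""
    clean = {k: v for k, v in fields.items() if k not in IDENTIFYING_FIELDS}
    # Assumption: a salted hash of the patient id serves as the alias, so
    # reports for the same patient share an alias without exposing the id.
    raw = (salt + str(fields.get("patient_id", ""))).encode("utf-8")
    clean["patient_alias"] = hashlib.sha256(raw).hexdigest()[:12]
    return clean


def collect(reports, knowledge_base, salt, send):
    """Run steps 302-310 over the reports; return how many were sent."""
    sent = 0
    for report in reports:
        if report.collected != "no":  # step 302: select uncollected reports
            continue
        outgoing = sanitize(report.fields, salt)  # step 304: sanitize
        report.collected = "yes"  # step 306: mark as collected
        # Step 308: map the local clinical code to the common code.
        outgoing["code"] = knowledge_base[outgoing["code"]]
        send(outgoing)  # step 310: transmit to the data repository
        sent += 1
    return sent
```

Running `collect` a second time over the same reports sends nothing, because every report already carries the value "yes" in its collected field. The sketch follows the step order given in the text; an implementation concerned with failed transmissions might instead set the field only after a successful send.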

[0028] An embodiment of the present invention allows for clinical data reports that have been created using different coding schemes and different formats to be collected for use in data mining. The reports from each hospital are transformed from a local format into a common format that can be entered into a common data repository for use in data mining. This can be accomplished without requiring any data re-entry by relying on the emerging DICOM SR standard along with the other SR standards that the hospitals utilize to store clinical data. The ability to query a consolidated data repository for clinical data reports can lead to better patient care and more information for medical device and pharmaceutical companies. For example, a stent developing company could use the data to find hidden design problems with their stents and they could use the data to create new stent designs. A pharmaceutical company could use the data repository to make correlations between their medications and procedures performed. In addition, medical researchers could use the data to make better procedures for handling complex medical problems and insurance companies could get more detailed clinical information faster to determine better coverage offerings.

[0029] As described above, the embodiments of the invention may be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. Embodiments of the invention may also be embodied in the form of computer program code containing instructions embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. An embodiment of the present invention can also be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.

[0030] While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order or importance, but rather the terms first, second, etc. are used to distinguish one element from another.

1. A method for collecting and distributing clinical data for data mining, the method comprising: selecting a local report for collection, wherein said local report is a structured reporting object and includes a local clinical code, patient identification data and report collection status; sanitizing said local report by removing said patient identification data; updating said report collection status to reflect said selecting; mapping said local report to a common format report, wherein said mapping includes: accessing a knowledge base that includes said local clinical code and a corresponding common format clinical code; and replacing said local clinical code with said corresponding common format clinical code; and transmitting said common format report to a data mining host system, wherein said data mining host system includes a data repository.
2. The method of claim 1 further comprising: receiving said common format report at said data mining host system; mapping the contents of said common format report to fields in said data repository; and adding the contents of said common format report to said data repository responsive to said mapping the contents of said common format report.
3. The method of claim 1 further comprising: receiving a request from a customer to access data contained in said data repository, wherein said request includes a data query; verifying that said customer has access to said data; and transmitting said data to said customer in response to said verifying.
4. The method of claim 1 wherein said customer is a research hospital staff member.
5. The method of claim 1 wherein said customer is a pharmaceutical company research team member.
6. The method of claim 1 wherein said customer is from a corporation that makes medical devices.
7. The method of claim 1 wherein said structured reporting object is an XML object.
8. The method of claim 1 wherein said structured reporting object is a DICOM structured reporting object.
9. The method of claim 1 wherein said structured reporting object is an HL-7 message object.
10. A method for collecting and distributing clinical data for data mining, the method comprising: receiving a common format report at a data mining host system, wherein said data mining host system includes a data repository and wherein said common format report is a structured reporting object including clinical coded medical data; mapping the contents of said common format report to fields in said data repository using an XML schema; and adding the contents of said common format report to said data repository responsive to said mapping.
11. The method of claim 10 further comprising: receiving a request from a customer to access data contained in said data repository, wherein said request includes a data query; verifying that said customer has access to said data; and transmitting said data to said customer in response to said verifying.
12. The method of claim 11 wherein said customer is a research hospital staff member.
13. The method of claim 11 wherein said customer is a pharmaceutical company research team member.
14. The method of claim 11 wherein said customer is from a corporation that makes medical devices.
15. The method of claim 10 wherein said structured reporting object is an XML object.
16. The method of claim 10 wherein said structured reporting object is a DICOM structured reporting object.
17. The method of claim 10 wherein said structured reporting object is an HL-7 message object.
18. A system for collecting and distributing clinical data for data mining, the system comprising: a network; and a hospital computer system in communication with said network, said hospital computer system including software to implement the method comprising: selecting a local report located on said hospital computer system for collection, wherein said local report is a structured reporting object and includes a local clinical code, patient identification data and report collection status; sanitizing said local report by removing said patient identification data; updating said report collection status to reflect said selecting; mapping said local report to a common format report, wherein said mapping includes: accessing a knowledge base located on said hospital computer system that includes said local clinical code and a corresponding common format clinical code; and replacing said local clinical code with said corresponding common format clinical code; and transmitting said common format report over said network to a data mining host system, wherein said data mining host system includes a data repository.
19. The system of claim 18 wherein said network is an Internet.
20. The system of claim 18 wherein said network is an intranet.
21. A system for collecting and distributing clinical data for data mining, the system comprising: a network; a storage device including a data repository; and a data mining host system in communication with said network and said storage device, said data mining host system including software to implement the method comprising: receiving a common format report via said network at said data mining host system, wherein said common format report is a structured reporting object including clinical coded medical data; mapping the contents of said common format report to fields in said data repository using an XML schema; and adding the contents of said common format report to said data repository responsive to said mapping.
22. The system of claim 21 wherein said network is an Internet.
23. The system of claim 21 wherein said network is an intranet.
24. The system of claim 21 wherein said data repository is a relational database.
25. A computer program product for collecting and distributing clinical data for data mining, the product comprising: a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for: selecting a local report located on said hospital computer system for collection, wherein said local report is a structured reporting object and includes a local clinical code, patient identification data and report collection status; sanitizing said local report by removing said patient identification data; updating said report collection status to reflect said selecting; mapping said local report to a common format report, wherein said mapping includes: accessing a knowledge base that includes said local clinical code and a corresponding common format clinical code; and replacing said local clinical code with said corresponding common format clinical code; and transmitting said common format report to a data mining host system, wherein said data mining host system includes a data repository.